Confidence Interval

The next question to anticipate is "how do we know this (where do these numbers come from)?" The numbers are based on the theoretical sampling distribution of the mean. If we randomly sample 100 people, how representative will the sample mean be of the population mean? There will always be some sampling error, but with random sampling that error behaves predictably: by the central limit theorem, the sample means follow an approximately normal distribution centered on the population mean. So if we randomly sampled 100 people, repeated this 100 times, and plotted a frequency distribution of the sample means, we would see a roughly normal distribution, with about 95% of these means falling within 1.96 standard errors (the standard deviation of the sampling distribution) of the true population mean. If we increase the sample size to 200 and repeat the exercise, there will be less sampling error (the opinions of 200 people are more representative of the population than those of 100 people; the standard error shrinks by a factor of the square root of 2). We therefore gain both power and precision, although there is always a trade-off between the two.
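The thought experiment above is easy to run on a computer. The sketch below is a minimal Python illustration, not the author's procedure: it assumes a hypothetical population of 10,000 opinion scores on a 1 to 10 scale, draws 2,000 repeated samples (rather than 100, to get a smoother picture) of size 100 and then 200, and reports what fraction of the sample means land within 1.96 standard errors of the population mean. The helper names `sample_means` and `coverage_within` are our own.

```python
import random
import statistics

def sample_means(population, n, reps, rng):
    """Draw `reps` random samples of size n (with replacement) and return their means."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

def coverage_within(means, center, se, z=1.96):
    """Fraction of sample means lying within z standard errors of `center`."""
    return sum(abs(m - center) <= z * se for m in means) / len(means)

rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical population: 10,000 opinion scores on a 1-10 scale.
population = [rng.randint(1, 10) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

for n in (100, 200):
    means = sample_means(population, n, reps=2000, rng=rng)
    se = sigma / n ** 0.5  # theoretical standard error of the mean
    print(f"n={n}: SE={se:.3f}, observed SD of means={statistics.stdev(means):.3f}, "
          f"within 1.96 SE: {coverage_within(means, mu, se):.1%}")
```

Sampling here is with replacement, which closely approximates sampling from a large population. The coverage should come out near 95% for both sample sizes, while the standard error for n = 200 is smaller by a factor of about 1.41.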

You don't need math, but you do need a feel for distributions. You can get one by running a couple of Monte Carlo simulations on your own: generate a 10 by 10 matrix of random numbers (say, integers between 1 and 10) and take the mean of each row. Plot these means in a frequency distribution. This is equivalent to randomly sampling 10 measures, 10 times. Like magic, you will see a normal distribution starting to form, but not the one you might expect: your raw data ranges from 1 to 10, while your sample means cluster in a much narrower range around 5.5 (typically spanning only two or three points). Do it 2 or 3 times and you'll be able to take your distribution to a table of the t-distribution and calculate power and precision for yourself.

I've done this in classes I've taught: give each student a 10 by 10 matrix of random numbers on a sheet of paper and have them calculate the mean of each row (a couple of minutes' work). Then plot their results on the blackboard for everyone to see. The results will amaze your students; it is very powerful stuff. This is also a good way to introduce the concepts of standard deviation and variance without the math.
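The paper-and-pencil exercise can also be simulated. In this sketch (assumptions: integers 1 to 10 drawn uniformly, a hypothetical class of 30 students, a fixed random seed, and our own helper names `worksheet`, `row_means`, and `text_histogram`), each student's sheet contributes 10 row means, and the pooled means are printed as a text histogram in bins of 0.5.

```python
import math
import random
import statistics
from collections import Counter

def worksheet(rng, rows=10, cols=10, low=1, high=10):
    """One student's sheet: a rows x cols matrix of random integers in [low, high]."""
    return [[rng.randint(low, high) for _ in range(cols)] for _ in range(rows)]

def row_means(matrix):
    """Mean of each row of a list-of-lists matrix."""
    return [statistics.fmean(row) for row in matrix]

def text_histogram(values, width=0.5):
    """Count values per bin of the given width; return sorted (bin_start, count) pairs."""
    bins = Counter(math.floor(v / width) * width for v in values)
    return sorted(bins.items())

rng = random.Random(7)   # fixed seed so the run is repeatable
students = 30            # hypothetical class size
pooled = [m for _ in range(students) for m in row_means(worksheet(rng))]

print(f"raw values run from 1 to 10; row means run from {min(pooled):.1f} to {max(pooled):.1f}")
print(f"mean of row means = {statistics.fmean(pooled):.2f}, SD = {statistics.stdev(pooled):.2f}")
for start, count in text_histogram(pooled):
    print(f"{start:4.1f} | {'*' * count}")
```

For integers drawn uniformly from 1 to 10 the population standard deviation is about 2.87, so the row means should have a standard deviation of roughly 2.87 / sqrt(10), or about 0.91. That is why their spread is so much narrower than the raw data.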

(From Gabriel) All the numbers come from the binomial distribution. With sample sizes of 100 and 1000, the normal approximation to the binomial distribution can also be used: the sample proportion is approximately normal with mean p (the proportion in the population) and standard deviation sqrt(p*(1-p)/n), where n is the sample size. Equivalently, the count of successes has mean n*p and standard deviation sqrt(n*p*(1-p)).
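To see how close the normal approximation is at these sample sizes, the sketch below compares the exact binomial probability P(X <= k) with its normal approximation. The continuity correction (adding 0.5 to k) is our addition, and the population proportion p = 0.4 and the 35% cutoff are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def binom_pmf(i, n, p):
    """Exact P(X = i) for X ~ Binomial(n, p), 0 < p < 1, computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k): mean n*p, SD sqrt(n*p*(1-p)), continuity-corrected."""
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)

def proportion_sd(n, p):
    """Standard deviation of the sample proportion: sqrt(p*(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.4  # hypothetical population proportion
for n in (100, 1000):
    k = round(0.35 * n)  # hypothetical cutoff: 35% "yes" or fewer
    print(f"n={n}: SD of proportion={proportion_sd(n, p):.4f}, "
          f"exact P(X<={k})={binom_cdf(k, n, p):.4f}, "
          f"normal approx={normal_approx_cdf(k, n, p):.4f}")
```

The standard deviation of the proportion drops from about 0.049 at n = 100 to about 0.015 at n = 1000, and the approximation tracks the exact probability closely at both sizes.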

Then, if you want the 90% confidence interval, you search for the value of p that leaves 5% of the area in the left tail (the lower confidence limit) and the value of p that leaves 5% of the area in the right tail (the upper confidence limit), so that the middle 90% lies between them.
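One way to carry out this search is sketched below, under our own interpretation of the procedure. For a hypothetical survey in which 40 of 100 respondents answer "yes", bisection finds the lower limit p at which observing 40 or more successes has probability 5%, and the upper limit p at which observing 40 or fewer has probability 5%. This exact construction is usually called the Clopper-Pearson interval. For comparison, the sketch also computes the simpler normal-approximation interval p_hat +/- z * sqrt(p_hat*(1-p_hat)/n) with z of about 1.645. The function names are our own.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0  # k < n here, so X <= k is impossible when p = 1
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def bisect(f, lo, hi, iters=60):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(k, n, level=0.90):
    """Search for the p values that leave (1 - level)/2 of the area in each tail."""
    tail = (1 - level) / 2
    # Lower limit: the p at which P(X >= k) = tail, i.e. 1 - P(X <= k-1) = tail.
    lower = 0.0 if k == 0 else bisect(lambda p: (1 - binom_cdf(k - 1, n, p)) - tail, 0.0, 1.0)
    # Upper limit: the p at which P(X <= k) = tail.
    upper = 1.0 if k == n else bisect(lambda p: binom_cdf(k, n, p) - tail, 0.0, 1.0)
    return lower, upper

def normal_interval(k, n, level=0.90):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = k / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.645 for a 90% interval
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

k, n = 40, 100  # hypothetical survey: 40 of 100 respondents say "yes"
ex_lo, ex_hi = exact_interval(k, n)
print(f"exact 90% CI:  {ex_lo:.3f} to {ex_hi:.3f}")
nm_lo, nm_hi = normal_interval(k, n)
print(f"normal 90% CI: {nm_lo:.3f} to {nm_hi:.3f}")
```

The two intervals should be close at n = 100; the exact search matters more for small samples or for proportions near 0 or 1.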


Tutorials

Random Variables and their Probability Distributions

http://www.zweigmedia.com/RealWorld/tutstats/frames8_1.html

Probability - Additive, Conditional, Independent, Bayes Theorem topics

http://www.zweigmedia.com/RealWorld/Summary6.html

Bernoulli Trials and Binomial Random Variables

http://www.zweigmedia.com/RealWorld/tutstats/frames8_2.html

Home Page of Zweigmedia

http://www.zweigmedia.com/RealWorld/index.html
